Data Set

This document presents the results of HomeWork-1 once more, in a more formal way.

The data set was collected during the 8th wave of the European Social Survey (ESS). It includes variables that measure the importance of various life values to people living in Spain.

The life-value variables are originally coded from the most positive answer to the most negative. For convenience of interpretation, they were recoded in reverse order, so that higher values indicate greater importance.
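
As an illustration of the recoding step, the sketch below reverses a 6-point item so that higher numbers mean greater importance. It is written in Python rather than in the software used for the report, and both the function name `reverse_code` and the formula `scale_max + 1 - answer` are assumptions based on the 1-6 answer scale, not the original script.

```python
def reverse_code(answer, scale_max=6):
    """Reverse a 1..scale_max item: 1 becomes scale_max, scale_max becomes 1."""
    if not 1 <= answer <= scale_max:
        raise ValueError(f"answer must be between 1 and {scale_max}")
    return scale_max + 1 - answer


# Hypothetical answers covering the whole scale, not survey data.
original = [1, 2, 3, 4, 5, 6]
recoded = [reverse_code(a) for a in original]
print(recoded)
```

Missing answers would need to be handled separately before applying such a function.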

Description of the Variables

The following variables are used in the analysis:

IPSTRGV - Important that government is strong and ensures safety

IPEQOPT - Important that people are treated equally and have equal opportunities

IMPSAFE - Important to live in secure and safe surroundings

IPFRULE - Important to do what is told and follow rules

IPMODST - Important to be humble and modest, not draw attention

IPBHPRP - Important to behave properly

The analysis examines how, in Spain, the value of a strong government relates to other life values.

The following hypotheses will be tested:

  1. The value of a strong government (IPSTRGV) has a strong positive relationship with the value of equal treatment and equal opportunities for all people (IPEQOPT).

  2. The value of a strong government (IPSTRGV) has a strong positive relationship with the value of safe surroundings (IMPSAFE).

  3. The value of a strong government (IPSTRGV) has a strong positive relationship with the value of following accepted rules (IPFRULE).

  4. The value of a strong government (IPSTRGV) has a strong positive relationship with the value of being modest and not attracting attention (IPMODST).

  5. The value of a strong government (IPSTRGV) has a strong positive relationship with the value of behaving properly (IPBHPRP).

  6. The value of following accepted rules (IPFRULE) and the value of behaving properly (IPBHPRP) are closely related in content and, taken together, may have a strong relationship with the value of a strong government (IPSTRGV).

To confirm or reject the hypotheses, we use linear regression. Testing hypotheses 1-5 requires a multiple linear regression. Testing hypothesis 6 requires a similar regression that also includes the interaction effect of the two predictors.

In addition to the independent variables above, some socio-demographic characteristics of the respondents were also included in the model: gender (GNDR), age (AGEA) and type of settlement (DOMICIL).

Before testing the hypotheses and constructing the regressions, we analyze descriptive statistics of the dependent variable by looking at the distribution of the answers.

Chart 1

The Legend of the Importance that Government is Strong and Ensures Safety

  1. Not like me at all
  2. Not like me
  3. A little like me
  4. Somewhat like me
  5. Like me
  6. Very much like me

A histogram was chosen because it suits an ordinal variable with only a few categories: it shows which answers were given more often than others.

Based on it, we can say that the largest number of answers falls on 6 - Very much like me. The distribution rises roughly exponentially across the scale: 1 - Not like me at all has the smallest number of answers and 6 has the largest.

Linear Regression

Next, we construct a regression model. The following table shows the regression parameters:

term estimate std.error statistic p.value
(Intercept) 1.43 0.20 6.97 0.00
ipeqopt.recoded 0.10 0.03 3.26 0.00
impsafe.recoded 0.37 0.02 15.80 0.00
ipfrule.recoded 0.12 0.02 6.80 0.00
ipmodst.recoded 0.05 0.03 2.02 0.04
ipbhprp.recoded 0.10 0.02 4.41 0.00
agea 0.00 0.00 1.20 0.23
as.factor(domicil)2 -0.04 0.12 -0.38 0.70
as.factor(domicil)3 -0.02 0.07 -0.29 0.77
as.factor(domicil)4 -0.08 0.06 -1.18 0.24
as.factor(domicil)5 -0.10 0.16 -0.62 0.54

This regression model is suitable for testing hypotheses 1-5. Before proceeding with the interpretation, we check for multicollinearity and for homoscedasticity:

GVIF Df GVIF^(1/(2*Df))
ipeqopt.recoded 1.079043 1 1.038770
impsafe.recoded 1.166511 1 1.080051
ipfrule.recoded 1.217939 1 1.103603
ipmodst.recoded 1.188306 1 1.090095
ipbhprp.recoded 1.322491 1 1.149996
agea 1.081735 1 1.040065
as.factor(domicil) 1.027051 4 1.003342

Since the GVIF values of all predictors are close to 1, there is no evidence of multicollinearity among the independent variables.
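
The last column of the GVIF table can be reproduced from the first two, which makes the "close to 1" check easier to follow. The sketch below is a Python illustration, not the code that produced the table. The helper name `adjusted_gvif` is hypothetical, and the inputs are the rounded values printed above.

```python
# (GVIF, Df) pairs copied from the table above.
gvif_table = {
    "ipeqopt.recoded": (1.079043, 1),
    "impsafe.recoded": (1.166511, 1),
    "ipfrule.recoded": (1.217939, 1),
    "ipmodst.recoded": (1.188306, 1),
    "ipbhprp.recoded": (1.322491, 1),
    "agea": (1.081735, 1),
    "as.factor(domicil)": (1.027051, 4),
}


def adjusted_gvif(gvif, df):
    """Compute GVIF^(1/(2*Df)), which makes terms with different Df comparable."""
    return gvif ** (1.0 / (2 * df))


adjusted = {name: adjusted_gvif(g, d) for name, (g, d) in gvif_table.items()}
for name, value in adjusted.items():
    print(f"{name}: {value:.6f}")

# The largest adjusted value is the one to compare against 1.
largest = max(adjusted, key=adjusted.get)
print("largest:", largest)
```

For a term with a single degree of freedom, the adjusted value is simply the square root of the GVIF.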

## 
##  studentized Breusch-Pagan test
## 
## data:  lreg1
## BP = 154.84, df = 10, p-value < 2.2e-16

Since the test is statistically significant, the residuals are heteroscedastic: their variance is not constant across observations. Consequently, the standard errors and p-values of the model may be unreliable, so the model is not well suited for testing the hypotheses. However, since the purpose of this assignment is not to draw conclusions that accurately reflect reality, the model will be interpreted anyway.
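
To show where the reported p-value comes from, the sketch below evaluates the upper tail of a chi-square distribution at BP = 154.84 with df = 10, using only the Python standard library. It relies on the closed form that holds for an even number of degrees of freedom. The function name `chi2_sf_even_df` is hypothetical, and this is an illustration rather than the routine used by the test itself.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m: P(X > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k   # update to (x/2)^k / k!
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(154.84, 10)
print(p_value)
print(p_value < 2.2e-16)
```

The result is far below 2.2e-16, which is consistent with the bound printed in the test output.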

The equation of the regression model, restricted to the life-value predictors, is as follows:

Y = 1.43 + 0.10*ipeqopt.recoded + 0.37*impsafe.recoded + 0.12*ipfrule.recoded + 0.05*ipmodst.recoded + 0.10*ipbhprp.recoded
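
A short worked example may help in reading the equation. The Python sketch below plugs answers into the life-value part of the model, using the rounded estimates from the regression table. The age and settlement terms are left out, as in the equation, so the numbers are only approximate. The function name `predict_ipstrgv` and the two example respondents are hypothetical.

```python
# Rounded estimates taken from the regression table.
INTERCEPT = 1.43
COEFFICIENTS = {
    "ipeqopt.recoded": 0.10,
    "impsafe.recoded": 0.37,
    "ipfrule.recoded": 0.12,
    "ipmodst.recoded": 0.05,
    "ipbhprp.recoded": 0.10,
}


def predict_ipstrgv(answers):
    """Predicted importance of a strong government for one respondent."""
    return INTERCEPT + sum(COEFFICIENTS[name] * answers[name] for name in COEFFICIENTS)


# Hypothetical respondents: lowest and highest answer on every item.
low = predict_ipstrgv({name: 1 for name in COEFFICIENTS})
high = predict_ipstrgv({name: 6 for name in COEFFICIENTS})
print(round(low, 2), round(high, 2))
```

The two extremes give a sense of the range the fitted values can cover on the 1-6 scale.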

The interpretation of the model is the following (at the 95% confidence level):

The model is significant (p-value of 2.2e-16 < 0.05);

The coefficient of determination is 0.24, hence the model explains 24% of the variance of the dependent variable;

In the constructed model, the life-value variables are significant, while the socio-demographic variables are not.

Creating the Behaviour Variable

To visualise the data, I will not use the dependent variable from the regression model. That variable is ordinal with six categories, which does not allow the data to be visualised in a clear and understandable way.

To build graphs 2 and 3, I will use part of my HomeWork-2. I take one factor, Behaviour (one of the strong predictors of that factor is the dependent variable of the linear regression built in HomeWork-1: IPSTRGV - Important that government is strong and ensures safety). To visualise the data in a readable and understandable way, I constructed the variable Behaviour, based on the factor from HomeWork-2. That variable is the sum of the predictors that make up the factor Behaviour:

(Behaviour <- ipbhprp.recoded+ipstrgv.recoded+imptrad.recoded+iprspot.recoded)

As a result, the variable Behaviour ranges from 4 to 24. Its values are interpreted for each respondent as follows:

Score from 4 to 8 = Behaviour is not important at all

Score from 9 to 13 = Behaviour is not important

Score from 14 to 18 = Behaviour is important

Score from 19 to 24 = Behaviour is very important
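
The construction of the score and its four bands can be sketched as follows. This is a Python illustration, not the original code. The function names `behaviour_score` and `behaviour_band` are hypothetical, and each item is assumed to be recoded to the 1-6 scale.

```python
BANDS = ((4, 8), (9, 13), (14, 18), (19, 24))


def behaviour_score(ipbhprp, ipstrgv, imptrad, iprspot):
    """Sum of the four recoded items that make up the Behaviour factor."""
    items = (ipbhprp, ipstrgv, imptrad, iprspot)
    if any(not 1 <= value <= 6 for value in items):
        raise ValueError("each item must be between 1 and 6")
    return sum(items)


def behaviour_band(score):
    """Return the band of the 4-24 score as a 'low-high' string."""
    for low, high in BANDS:
        if low <= score <= high:
            return f"{low}-{high}"
    raise ValueError("score must be between 4 and 24")


# Hypothetical respondent, not survey data.
score = behaviour_score(4, 4, 3, 3)
print(score, behaviour_band(score))
```

Note that the last band covers six values (19 to 24), while the other three cover five each.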

This is how I obtained a variable with a wider range. I understand that the validity of this variable has not been tested, but the primary task of this HomeWork is not to draw realistic conclusions about reality but to demonstrate my ability to create different kinds of charts and visualise the data.

Graph 2

Graph 2 is the following.

This kind of chart was chosen for several reasons. First, it shows the distributions of the dependent and independent variables. It also shows the relationship between them. According to the chart, the importance of behaviour grows with age. A possible explanation is that, with time, age and experience, people come to realise the importance of this life value.

Thus, according to the regression model, the value of a strong state has a strong positive relationship with the value of safe surroundings and a weaker one with the other values in the model. Therefore, the constructed model supports hypotheses 1-5, with the caveat that the data are heteroscedastic.

Graph 3

Graph 3 is the following. It has interactive elements.

The legend of the variables is as follows:

Gender:

  1. Male
  2. Female

Domicile:

  1. A big city
  2. The suburbs or outskirts of a big city
  3. A town or a small city
  4. A country village
  5. A farm or home in the countryside

This type of chart was chosen because it visualises the relationships between the variables as well as the individual respondents. It shows that among younger respondents the importance of Behaviour is more widely dispersed. People aged about 60 and over consider behaviour a more important life value.

Creating the Regression with Interaction Effect

Now hypothesis 6 will be tested. To test it, a similar regression model is fitted, with the interaction effect of the two predictors included.

term estimate std.error statistic p.value
(Intercept) 1.35 0.27 4.95 0.00
ipeqopt.recoded 0.10 0.03 3.28 0.00
impsafe.recoded 0.37 0.02 15.77 0.00
ipfrule.recoded 0.14 0.06 2.46 0.01
ipmodst.recoded 0.05 0.03 2.00 0.05
ipbhprp.recoded 0.11 0.04 2.54 0.01
agea 0.00 0.00 1.21 0.23
as.factor(domicil)2 -0.04 0.12 -0.38 0.71
as.factor(domicil)3 -0.02 0.07 -0.29 0.77
as.factor(domicil)4 -0.08 0.06 -1.17 0.24
as.factor(domicil)5 -0.10 0.16 -0.62 0.53
ipfrule.recoded:ipbhprp.recoded -0.01 0.01 -0.43 0.67

The interaction effect in this model is “ipfrule.recoded:ipbhprp.recoded”.

According to this model, the interaction effect is not statistically significant (0.67 > 0.05). Based on the constructed model, we reject the hypothesis about an interaction between the value of following accepted rules and the value of behaving properly.
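
One way to read an interaction term is that the slope of one predictor depends on the level of the other. The Python sketch below uses the rounded estimates from the table to compute the slope of ipfrule.recoded at each level of ipbhprp.recoded. The function name `ipfrule_slope` is hypothetical, and the figures are an illustration, not a re-estimation of the model.

```python
# Rounded estimates from the regression table with the interaction effect.
B_IPFRULE = 0.14
B_INTERACTION = -0.01


def ipfrule_slope(ipbhprp):
    """Slope of ipfrule.recoded at a given level of ipbhprp.recoded."""
    return B_IPFRULE + B_INTERACTION * ipbhprp


slopes = {level: round(ipfrule_slope(level), 2) for level in range(1, 7)}
for level, slope in slopes.items():
    print(level, slope)
```

The slope changes only slightly over the whole 1-6 range, which fits the finding that the interaction is not significant.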

Creating the Ordinal Regression

Next, as a secondary test of hypotheses 1-5, an ordinal regression is constructed.

term estimate std.error statistic coefficient_type
impsafe.recoded 0.72 0.05 15.22 coefficient
ipfrule.recoded 0.18 0.03 5.50 coefficient
ipmodst.recoded 0.22 0.05 4.59 coefficient
ipbhprp.recoded 0.20 0.04 4.93 coefficient
1|2 1.63 0.33 4.93 zeta
2|3 2.97 0.30 9.75 zeta
3|4 3.89 0.30 12.75 zeta
4|5 4.93 0.31 15.83 zeta
5|6 6.96 0.33 20.89 zeta

The coefficients are interpreted as odds ratios:

  1. For a one-unit increase in “impsafe.recoded”, the odds of value 6 of “ipstrgv.recoded.f” versus values 1-5 of “ipstrgv.recoded.f” combined are 2.05 times greater.

  2. For a one-unit increase in “ipfrule.recoded”, the odds of value 6 of “ipstrgv.recoded.f” versus values 1-5 of “ipstrgv.recoded.f” combined are 1.2 times greater.

  3. For a one-unit increase in “ipmodst.recoded”, the odds of value 6 of “ipstrgv.recoded.f” versus values 1-5 of “ipstrgv.recoded.f” combined are 1.25 times greater.

  4. For a one-unit increase in “ipbhprp.recoded”, the odds of value 6 of “ipstrgv.recoded.f” versus values 1-5 of “ipstrgv.recoded.f” combined are 1.23 times greater.
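
The odds ratios in this list are the exponentiated coefficients of the ordinal regression. The Python sketch below recomputes them from the rounded estimates in the table, so the last digit may differ slightly from figures based on unrounded coefficients. The variable names are illustrative.

```python
import math

# Rounded coefficients copied from the ordinal regression table.
coefficients = {
    "impsafe.recoded": 0.72,
    "ipfrule.recoded": 0.18,
    "ipmodst.recoded": 0.22,
    "ipbhprp.recoded": 0.20,
}

# An odds ratio is exp(coefficient): the multiplicative change in the odds
# of a higher answer for a one-unit increase in the predictor.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Under the proportional odds assumption of this type of model, the same ratio applies to every split of the scale, not only to value 6 versus values 1-5.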

Comparing the two constructed models, the results are analogous. The value of safe surroundings has the strongest influence on the importance of a strong government.